library(tidyverse)
library(gapminder)
library(maps)
library(WDI)
(df <- gapminder)
asean <- c("Brunei", "Cambodia", "Laos", "Myanmar", "Philippines", "Indonesia", "Malaysia", "Singapore")
df %>% filter(country %in% asean) %>%
  ggplot(aes(x = year, y = gdpPercap, col = country)) + geom_line()
df %>% filter(country %in% asean) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, col = country)) + geom_point()
df %>% filter(country %in% asean) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, col = country)) +
  geom_point() + coord_trans(x = "log10", y = "identity")
\(\log_{10}{100}\) = 2, \(\log_{10}{1000}\) = 3, \(\log_{10}{10000}\) = 4
\(10^{2.5}\) ≈ 316.227766, \(10^{3}\) = 1000, \(10^{3.5}\) ≈ 3162.2776602,
\(10^{4}\) = 10000, \(10^{4.5}\) ≈ 31622.7766017.
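
The values above can be checked directly in R. The following is a minimal sketch in base R; the variable names `exponents` and `values` are illustrative and not part of the original material.

```r
# Exponents at half-step intervals, as they would appear on a log10 axis
exponents <- seq(2, 4.5, by = 0.5)

# Back-transform each exponent to the original scale
values <- 10^exponents

# Show exponent/value pairs side by side
data.frame(exponent = exponents, value = values)

# log10() inverts the transformation
log10(c(100, 1000, 10000))
```

Equal steps in the exponent correspond to equal multiplicative steps in the value, which is why points that bunch together at low GDP per capita spread out on a log10 axis.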
df_wdi <- WDI(
  country = "all",
  indicator = c(lifeExp = "SP.DYN.LE00.IN", pop = "SP.POP.TOTL", gdpPercap = "NY.GDP.PCAP.KD")
)
df_wdi
df_wdi_extra <- WDI(
  country = "all",
  indicator = c(lifeExp = "SP.DYN.LE00.IN", pop = "SP.POP.TOTL", gdpPercap = "NY.GDP.PCAP.KD"),
  extra = TRUE
)
df_wdi_extra
EDA is an iterative cycle that helps you understand what your data says. When you do EDA, you:

1. Generate questions about your data
2. Search for answers by visualising, transforming, and/or modeling your data
3. Use what you learn to refine your questions and/or generate new questions

EDA is an important part of any data analysis. You can use EDA to make discoveries about the world; or you can use EDA to ensure the quality of your data, asking questions about whether the data meets your standards or not.
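
One pass through this cycle might look like the sketch below. It assumes the `dplyr`, `ggplot2`, and `gapminder` packages are installed; the question asked and the names `life_by_continent` and `p` are illustrative choices, not part of the original material.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Question: how has life expectancy changed on each continent?

# Transform: median life expectancy per continent and year
life_by_continent <- gapminder %>%
  group_by(continent, year) %>%
  summarise(median_lifeExp = median(lifeExp), .groups = "drop")

# Visualise: one line per continent
p <- ggplot(life_by_continent,
            aes(x = year, y = median_lifeExp, colour = continent)) +
  geom_line()
p
```

Whatever pattern the plot shows becomes the starting point for the next, more specific question, for example which countries depart most from their continent's median.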
The term "Open Data" has a very precise meaning. Data or content is open if anyone is free to use, re-use or redistribute it, subject at most to measures that preserve provenance and openness.
WDI(country = "all",
    indicator = "NY.GDP.PCAP.KD",
    start = 1960,
    end = 2020,
    extra = FALSE,
    cache = NULL)
c('women_private_sector' = 'BI.PWK.PRVS.FE.ZS')
library(WDI)
WDIsearch(string = "NY.GDP.PCAP.KD",
          field = "indicator", cache = NULL)
WDIsearch(string = "population",
          field = "name", short = FALSE, cache = wdi_cache)
WDIsearch(string = "NY.GDP.PCAP.KD",
          field = "indicator", short = FALSE, cache = NULL)
WDIsearch(string = "gdp",
          field = "name", short = TRUE, cache = NULL)
WDIbulk downloads the zip file from the Bulk Downloads section of the WDI site and returns a list containing six data frames: Data, Country, Series, Country-Series, Series-Time, FootNote.
timeout: integer; maximum number of seconds to wait for the download
```r
wdi$FootNote
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
---
#### Bulk Downloads: Data
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxud2RpIDwtIFdESWJ1bGsodGltZW91dCA9IDYwMClcbmBgYCJ9 -->
```r
wdi <- WDIbulk(timeout = 600)
#> trying URL 'https://databank.worldbank.org/data/download/WDI_csv.zip'
#> Content type 'application/x-zip-compressed' length 70064539 bytes (66.8 MB)
#> downloaded 66.8 MB
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
---
#### Bulk Downloads: Country
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxud2RpX2NhY2hlXG5gYGBcbmBgYCJ9 -->
```r
wdi_cache
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
---
#### Bulk Downloads: Series
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuZ2xpbXBzZShXRElfZGF0YSlcbmBgYFxuYGBgIn0= -->
```r
glimpse(WDI_data)
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
---
#### Bulk Downloads: Country-Series
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuV0RJX2RhdGEkc2VyaWVzXG5gYGBcbmBgYCJ9 -->
```r
WDI_data$series
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
---
#### Bulk Downloads: Series-Time
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuV0RJX2RhdGEkY291bnRyeVxuYGBgXG5gYGAifQ== -->
```r
WDI_data$country
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
---
#### Bulk Downloads: Footnote
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuV0RJX2RhdGEkY291bnRyeSAgJT4lIGZpbHRlcihjb3VudHJ5ID09IFxcSmFwYW5cXClcbmBgYFxuYGBgIn0= -->
```r
WDI_data$country %>% filter(country == "Japan")
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
---
### WDIcache
Download an updated list of available WDI indicators from the World Bank website. Returns a list for use in the WDIsearch function.
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuV0RJc2VhcmNoKHN0cmluZyA9IFxcZ2RwXFwsIFxuICBmaWVsZCA9IFxcbmFtZVxcLCBzaG9ydCA9IEZBTFNFLCBjYWNoZSA9IHdkaV9jYWNoZSkgXG5gYGBcbmBgYCJ9 -->
```r
WDIsearch(string = "gdp",
          field = "name", short = FALSE, cache = wdi_cache)
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
Downloading all series information from the World Bank website can take time. The WDI package ships with a local data object with information on all the series available on 2012-06-18. You can update this database by retrieving a new list using `WDIcache`, and then feeding the resulting object to `WDIsearch` via the cache argument.
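
The refresh-then-search workflow described above could be wrapped as follows. This is a sketch only: `search_fresh_indicators` is a hypothetical helper name, and the call to `WDIcache()` needs internet access and may take a while.

```r
library(WDI)

# Hypothetical helper: refresh the indicator list, then search within it
search_fresh_indicators <- function(pattern, field = "name") {
  # Download the current list of series from the World Bank (needs internet)
  fresh_cache <- WDIcache()
  # Search the refreshed list instead of the one shipped with the package
  WDIsearch(string = pattern, field = field, short = TRUE, cache = fresh_cache)
}

# Usage (not run here because it needs network access):
# search_fresh_indicators("gdp per capita")
```

If you search several times in one session, it is cheaper to call `WDIcache()` once, keep the result in an object such as `wdi_cache`, and pass that object to each `WDIsearch` call.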
---
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuY28ycGNhcCA8LSBXREkoY291bnRyeSA9IFxcYWxsXFwsIGluZGljYXRvciA9IFxcRU4uQVRNLkNPMkUuUENcXCwgc3RhcnQgPSAxOTYwLCBlbmQgPSBOVUxMLCBleHRyYSA9IFRSVUUsIGNhY2hlID0gd2RpX2NhY2hlKVxuYGBgXG5gYGAifQ== -->
```r
co2pcap <- WDI(country = "all", indicator = "EN.ATM.CO2E.PC",
               start = 1960, end = NULL, extra = TRUE, cache = wdi_cache)
```
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
---
### WDI_data
List of 2 data frames
The first character matrix includes a full list of WDI series. This list is updated semi-regularly. Users can refresh the list manually using the 'WDIcache()' function and search in the updated list using the 'cache' argument.
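
A quick way to look at the packaged object without any download is sketched below. The column names `indicator`, `name`, and `country` are assumptions based on the package documentation and may differ between WDI versions.

```r
library(WDI)

# WDI_data ships with the package, so no download is needed
str(WDI_data, max.level = 1)

# First few series: indicator codes and their names
head(WDI_data$series[, c("indicator", "name")])

# Look up one country in the country table
WDI_data$country[WDI_data$country[, "country"] == "Japan", ]
```

Bracket indexing with column names is used here because it works whether the elements are stored as data frames or as character matrices.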
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuY28ycGNhcFxuYGBgXG5gYGAifQ== -->
```r
co2pcap
```
Find indicators:
WDIsearch(string = "gdp", field = "name", short = FALSE, cache = NULL)
Indicator: EN.ATM.CO2E.PC
readr, readxl
readr, ggplot2; Public Data, WDI, WIR, etc.
There is no rule about which questions you should ask to guide your research. However, two types of questions will always be useful for making discoveries within your data. You can loosely word these questions as:

1. What type of variation occurs within my variables?
2. What type of covariation occurs between my variables?

The rest of this tutorial will look at these two questions. To make the discussion easier, let’s define some terms…
ggplot2 Basics: visualization
ggplot2 Extra